-
-
Notifications
You must be signed in to change notification settings - Fork 793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add and derive deserialize_from #1094
Conversation
Oh also obviously BLATANTLY MISSING: TESTS. ANY TESTS. |
Any tips on testing this kind of thing would be appreciated. My local experiments have mostly consisted of asserting that serializing and deserializing round-trips correctly. |
Currently grinding through the serde test suite, lots of little derive bugs. Starting to regret changing a bunch of things from |
b86bbeb
to
1a57e5d
Compare
Pushed fixes:
One thing I didn't realize until now is that deserialize_from kinda doesn't make any sense when doing "remote" impls? |
serde/src/de/impls.rs
Outdated
@@ -66,13 +68,37 @@ impl<'de> Visitor<'de> for BoolVisitor { | |||
} | |||
} | |||
|
|||
struct BoolFromVisitor<'a>(&'a mut bool); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think primitive types need this specialization. The pointer indirection will probably be less performant than the default implementation. I'm easily convinced otherwise by benchmarks ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while since I implemented this, but I seem to recall getting wins from doing this. Gonna start profiling tomorrow.
E: Error, | ||
{ | ||
match str::from_utf8(v) { | ||
Ok(s) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forward to visit_str
instead of repeating the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deserialize() impl doesn't, and this code seems clearer without it. (happy to change if you disagree)
serde/src/de/impls.rs
Outdated
where | ||
D: Deserializer<'de>, | ||
{ | ||
// Some enum is opaque, can't do anything good here :( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't you do
match *self.0 {
Some(ref mut data) => data.deserialize_from(deserializer),
None => Some(T::deserialize(deserializer)?),
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, but hard to say if this is justified. Very workload-specific.
where | ||
A: SeqAccess<'de>, | ||
{ | ||
$clear(&mut self.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the real optimization here to not clear, but overwrite all already existing elements, if the new sequence has more elements, append, if it has less, drop the leftover ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't do this for BTreeSet and BinaryHeap, so I just did the generically correct thing. Seems like it could be left as future work to split this up and further optimize it?
I have fallen behind on PRs so I haven't had a chance to review the implementation yet, but -- I would be interested in exploring some ways to modularize the serde and serde_derive crates better so that large but uncommonly required features like this one can be delivered in their own crate. The Maybe this means factoring a lot of the serde_derive code into a separate pre-1.0 helper crate that is depended on by serde_derive, serde_derive_state, and a deserialize_from derive crate. Would you be interested in figuring out a structure that would work for deserialize_from as part of getting this integrated? Otherwise would it be possible to provide deserialize_from in its own crate somehow, even if it means some code duplication with serde_derive? |
I don't really see how deserialize_from could be seperated out, realisitically? It's part of the trait definition. |
You can make a |
How would that work? It can't be blanketed, because actual type-specific code needs to be emitted. |
Okay I can see how shipping this outside of the Deserialize trait definition would be problematic. I am on board with accepting this as long as it does not affect compile times for anyone not using deserialize_from (basically the same concern as for clone_from). As is, when I tried this it raises compile time for the serde crate by 13% (5.9 seconds to 6.7 seconds) which is a bit much for something that few people will use. Worse, Deserialize impls already compile very slow (serde-rs/json#313) so something like 2x more generated code in every impl would not be nice. Could you propose what you think is the best way to feature gate this? Up to you how you want to get this merged. If it is easier, we can merge the serde pieces up front and follow up in one or more other PRs with the serde_derive pieces as those are written and tested. One thing I don't like here is all the deserialize_from impls that are conceptually no different from the default implementation. For example the bool one: fn visit_bool<E>(self, v: bool) -> Result<(), E> {
*self.0 = v;
Ok(())
} Could you share some benchmarks that show these outperform the default deserialize_from? Is there an easy way to tell which types benefit from a custom deserialize_from? Or is a custom deserialize_from always preferable? |
Ok finally got a chance to double check primitive stuff. Looks like, at least on webrender, removing the primitive impls is neutral, or possibly even profitable (the deserializer went from 7.1% to 6.6% of backend time; probably noise). I'll take them out, then. |
1e49cf2
to
108764b
Compare
This adds a new "deserialize_from" feature (default off) that opts into deriving deserialize_from with #[derive(Deserialize)].
108764b
to
34936be
Compare
Ok I refactored out all the deserialize_from derivation code out so it could be feature'd out. Ultimately we weren't sharing a ton of code, so it's not a huge loss. Writing it that way helped me incrementalize the development process for myself better, but that doesn't matter now. I also did a kinda naive addition of deserialize_from to the test suite. |
Webrender, and therefore gecko, will likely be moving forward with a soft fork of serde in the next few days with these patches, plus the additional patch found on https://github.com/Gankro/serde/tree/deserialize_from_enums3 Even if you aren't ready to make a call on merging this, we'd certainly appreciate more thorough review. :) |
I'd be on board with merging this PR as it is if we make the |
That makes sense to me. |
Wait, would that also put the trait method behind the feature? |
Also CC #855 |
Hmm right, that could be messy with derives and such :/ |
The custom one was functionally identical to the default implementation given by the Deserialize trait. If someone has benchmarks that the custom one performs better, we can put it back.
The naming here isn't great but no need to change it in this PR.
I started making progress on the review -- pushed a few fixup commits as I go. Some remaining items:
|
|
The CSV reader and JSON stream deserializer are interesting because they are APIs designed for deserializing lots of instances of the same type that are processed one at a time -- seems like an obvious candidate for deserialize_from. CSV is why we filed #855 in the first place so we may as well be certain that we are addressing their use case in the best way. There aren't really benchmarks of Serde by itself since nothing interesting happens without a data format, but JSON benchmarks live here and Bincode benchmarks live here. I wouldn't bother with either of those, not looking for anything fancy just a 30 line thing I can run and see an improvement. I would do this myself but I am about to call it a night. 😴 |
The actual deserialize work is WIP, because I probably need to refactor it so it can be feature-flagged off. I think the first two commits are "good to go", though.
One thing missing from this design is deriving deserialize_from for exactly C-like enums, but that can obviously be done later.
Pushing this up now because my brain is melting trying to do the non-C-like enum work I need for webrender, and so I'm gonna take a bit of a "break" by dealing with review complaints.